- Asynchronous ARM, by Steve Furber
-
- The async ARM is probably the only ARM-related activity which isn't covered
- by an NDA! This is pure university research, with no plans for commercial
- exploitation at present. ARM Ltd is very supportive and interested, but
- as you would expect they are waiting to see what the technology does before
- building any business plans around it.
-
- There was a good article in the January '93 issue of Byte on our work, and I will be
- presenting a paper at VLSI '93 on the architecture of the design. We haven't
- got much else onto paper yet, but material is beginning to come together.
- We will generate some full reports when we have seen silicon (in a couple
- of months). Below I append a summary submission I made to 'Hot Chips' in
- Stanford, which was accepted for a presentation this summer:
-
- AMULET1 - An Asynchronous ARM Processor
- =======================================
-
- A fully asynchronous implementation of the ARM microprocessor has
- been developed using Sutherland's "Micropipeline" approach. The
- design incorporates a number of concurrent units which cooperate
- to give instruction level compatibility with the existing synchronous
- part. These include an Address unit, which autonomously generates
- instruction fetch requests and interleaves (non-deterministically)
- data requests from the Execution unit; a Register file which sources
- operands, queues write destinations and handles data dependencies;
- an Execution unit which includes a multiplier, a shifter and an
- ALU with data-dependent delay; a Data interface which performs byte
- extraction and alignment and includes an instruction prefetch buffer,
- and a control path which performs instruction decode. These units
- all operate independently, only synchronizing at mutual interfaces
- to exchange data.
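-
- As a loose software analogy for units that run independently and
- synchronize only at their interfaces, the sketch below chains worker
- threads through single-slot queues. It is an illustration only, not a
- model of the AMULET1 circuits: the names unit and run_pipeline, the
- single-slot queues and the None shutdown sentinel are all assumptions
- made for the example.

```python
import queue
import threading


def unit(work, inbox, outbox):
    """One self-timed unit: take an item when one arrives, process it, pass it on."""
    while True:
        item = inbox.get()        # wait for the upstream unit to offer data
        if item is None:          # sentinel: propagate shutdown and stop
            outbox.put(None)
            return
        outbox.put(work(item))    # wait until the downstream slot is free


def run_pipeline(items, works):
    """Push items through a chain of independently running units."""
    # A single-slot queue stands in for the latch at each interface (assumption).
    links = [queue.Queue(maxsize=1) for _ in range(len(works) + 1)]
    threads = [
        threading.Thread(target=unit, args=(w, links[i], links[i + 1]))
        for i, w in enumerate(works)
    ]
    results = []

    def collect():
        # Drain the last interface until the shutdown sentinel arrives.
        while True:
            out = links[-1].get()
            if out is None:
                return
            results.append(out)

    threads.append(threading.Thread(target=collect))
    for t in threads:
        t.start()
    for item in items:            # items must not be None (reserved as sentinel)
        links[0].put(item)
    links[0].put(None)
    for t in threads:
        t.join()
    return results
```

- No unit knows about a global clock here: each one proceeds as soon as
- its input is available and its output slot is free, which is the only
- property of the design this sketch tries to mirror.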
-
- The design demonstrates that all the usual problems of processor
- design can be solved in this asynchronous framework: backwards
- instruction set compatibility, interrupts and exact exceptions for
- memory faults are all covered. It also demonstrates some unusual
- behaviour, for instance non-deterministic prefetch depth beyond
- a branch instruction (though the instructions which actually get
- executed are, of course, deterministic). There are some unusual
- problems for compiler optimization, as the metric which must be
- used to compare alternative code sequences is continuous rather
- than discrete, and the non-determinism in external behaviour must
- also be taken into account.
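-
- To illustrate why the cost metric becomes continuous, the sketch below
- scores additions by a delay that grows with the longest carry chain,
- rather than by a fixed cycle count. Everything in it is hypothetical:
- the carry-chain delay model, the constants BASE_NS and PER_BIT_NS, and
- the function names are assumptions for the example, not measured or
- documented properties of the chip.

```python
BASE_NS = 2.0      # hypothetical fixed cost of one addition
PER_BIT_NS = 0.25  # hypothetical extra cost per bit of carry propagation


def longest_carry_chain(a, b, width=32):
    """Length in bits of the longest carry ripple when adding a and b."""
    carry = 0
    run = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y
        propagate = x ^ y
        if carry and propagate:
            run += 1          # incoming carry ripples through this bit
        elif generate:
            run = 1           # a new carry starts here
        else:
            run = 0           # no carry leaves this bit
        carry = generate | (propagate & carry)
        longest = max(longest, run)
    return longest


def estimated_delay_ns(additions):
    """Sum of data-dependent delays for a sequence of (a, b) additions."""
    return sum(BASE_NS + PER_BIT_NS * longest_carry_chain(a, b)
               for a, b in additions)
```

- Under this toy model, a single addition of 1 + 1 and a single addition
- of 0xFFFFFFFF + 1 count as one instruction each, yet their estimated
- delays differ, so two code sequences of equal length can no longer be
- ranked by counting instructions or cycles alone.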
-
- The chip (which is presently in fabrication) was designed using a
- mixture of custom datapath and compiled control logic elements, as
- was the synchronous ARM. The fabrication technology is the same as
- that used for one version of the synchronous part, reducing the
- number of variables when comparing the two parts.
-
- The macrocell size (without pad ring) is 5.5mm by 4.5mm on a 1 micron
- CMOS process, which is about twice the area of the synchronous part.
- Some of the increase can be attributed to the more sophisticated
- organization of the new part: it has a deeper pipeline than the
- clocked version, and it supports multiple outstanding memory requests.
- There is undoubtedly some overhead attributable to the asynchronous
- control logic, but we estimate this to be closer to 20% than to the
- 100% suggested by the direct comparison.
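-
- The area figures above can be worked through as follows. This is a
- back-of-envelope sketch: it assumes "about twice" means exactly 2x and
- that the 20% overhead is measured relative to the synchronous area.
- Both readings, and the name area_breakdown, are assumptions.

```python
def area_breakdown(width_mm=5.5, height_mm=4.5, ratio=2.0, overhead=0.20):
    """Split the asynchronous macrocell area into rough contributions (mm^2)."""
    async_area = width_mm * height_mm          # macrocell without pad ring
    sync_area = async_area / ratio             # implied synchronous area (assumes exactly 2x)
    control_overhead = sync_area * overhead    # asynchronous control logic (assumed basis)
    organization = async_area - sync_area - control_overhead  # deeper pipeline etc.
    return {
        "async_area": async_area,
        "sync_area": sync_area,
        "control_overhead": control_overhead,
        "organization": organization,
    }
```

- With the default figures this gives roughly 24.75 mm^2 for the
- asynchronous macrocell, about 12.4 mm^2 implied for the synchronous
- part, about 2.5 mm^2 of control overhead and about 9.9 mm^2 left over
- for the more sophisticated organization.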
-
- The performance of the chip has been simulated at around 20K Dhrystones,
- which is comparable to the synchronous part. This is based on compiler
- output which takes no note of data dependencies between instructions
- (the performance of the synchronous part is unaffected by instruction
- order), so we expect to be able to improve on this considerably by
- code re-ordering. The first design is very conservative in its timing,
- as there is no equivalent to backing-off on the clock frequency if the
- samples don't meet the design speed, so again we see considerable room
- for improvement through reducing the engineering margins.
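-
- The kind of code re-ordering meant here can be sketched by counting
- how often an instruction reads a register written by the instruction
- immediately before it. The instruction encoding as (dest, sources)
- tuples, the function name and the idea that only adjacent dependencies
- matter are simplifying assumptions for the example, not a description
- of how the chip actually stalls.

```python
def adjacent_dependencies(instructions):
    """Count instructions that read the register written by their predecessor.

    Each instruction is a (dest, sources) tuple, e.g. ("r2", ("r1",)).
    """
    count = 0
    for (prev_dest, _), (_, sources) in zip(instructions, instructions[1:]):
        if prev_dest in sources:
            count += 1
    return count


# Hypothetical four-instruction sequence, as naive compiler output.
original = [
    ("r1", ("r0",)),
    ("r2", ("r1",)),   # needs r1 from the line above
    ("r3", ("r4",)),
    ("r5", ("r3",)),   # needs r3 from the line above
]

# Same instructions, interleaved so that no result is needed immediately.
reordered = [original[0], original[2], original[1], original[3]]
```

- Here adjacent_dependencies(original) is 2 and
- adjacent_dependencies(reordered) is 0, while each instruction still
- sees the same inputs, so the two orderings compute the same values.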
-
- Tests on the first silicon should enable us to refine the above results
- before the Symposium takes place. The work has taken place as part of
- a broad ESPRIT-funded investigation into low-power technologies within
- the European Open Microprocessor Systems Initiative (OMI) programme,
- where there is interest in low-power techniques both for portable
- equipment and (in the longer term) to alleviate the problems of the
- increasingly high dissipation of high-performance chips. This initial
- investigation into the role asynchronous logic might play in the quest
- for lower power has now demonstrated through simulation (and shortly
- through silicon) that asynchronous techniques can be applied to problems
- of the scale of a complete microprocessor.
-
-
- I hope this gives you some of what you want.
-
- ---Steve
-
- --------------------------------------------------------------------
- S B Furber tel: (+44) 61 275 6129
- ICL Professor of Computer Engineering fax: (+44) 60 275 6202
- The University email: sfurber@cs.man.ac.uk
- Oxford Road
- Manchester M13 9PL
- UK
- --------------------------------------------------------------------
-
-
-